Speaker adaptation based on regularized speaker-dependent eigenphone matrix estimation

Authors

  • Wen-Lin Zhang
  • Wei-Qiang Zhang
  • Dan Qu
  • Bi-Cheng Li
Abstract

Eigenphone-based speaker adaptation outperforms conventional maximum likelihood linear regression (MLLR) and eigenvoice methods when there is sufficient adaptation data. However, it suffers from severe over-fitting when only a few seconds of adaptation data are provided. In this paper, various regularization methods are investigated to obtain a more robust speaker-dependent eigenphone matrix estimation. Element-wise l1 norm regularization (known as lasso) encourages the eigenphone matrix to be sparse, which reduces the number of effective free parameters and improves generalization. Squared l2 norm regularization promotes an element-wise shrinkage of the estimated matrix towards zero, thus alleviating over-fitting. Column-wise unsquared l2 norm regularization (known as group lasso) acts like the lasso at the column level, encouraging column sparsity in the eigenphone matrix, i.e., preferring an eigenphone matrix with many zero columns as the solution. Each column corresponds to an eigenphone, which is a basis vector of the phone variation subspace. Thus, group lasso tries to prevent the dimensionality of the subspace from growing beyond what is necessary. For nonzero columns, group lasso acts like a squared l2 norm regularization with an adaptive weighting factor at the column level. Two combinations of these methods are also investigated, namely elastic net (applying l1 and squared l2 norms simultaneously) and sparse group lasso (applying l1 and column-wise unsquared l2 norms simultaneously). Furthermore, a simplified method for estimating the eigenphone matrix in the case of diagonal covariance matrices is derived, and a unified framework for solving various regularized matrix estimation problems is presented. Experimental results show that these methods improve the adaptation performance substantially, especially when the amount of adaptation data is limited. The best results are obtained when using the sparse group lasso method, which combines the advantages of both the lasso and group lasso methods. Using speaker-adaptive training, performance can be further improved.
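
To make the five penalties named in the abstract concrete, the sketch below implements their standard proximal (shrinkage) operators on a small matrix whose columns stand in for eigenphones. It is an illustration under stated assumptions, not the authors' solver: the abstract does not describe their unified framework, the likelihood term is omitted entirely, and the names `V`, `t`, `t1`, `t2`, `tg` and all function names are hypothetical.

```python
import math

# Hypothetical illustration: V is a list of rows; each COLUMN plays the role
# of one eigenphone. Thresholds t, t1, t2, tg are assumed step-scaled weights.

def soft_threshold(x, t):
    """Scalar soft-thresholding: the proximal operator of t * |x|."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def prox_l1(V, t):
    """Lasso: shrink every element toward zero and zero out the small ones."""
    return [[soft_threshold(x, t) for x in row] for row in V]

def prox_sq_l2(V, t):
    """Squared l2, penalty (t / 2) * sum(x^2): uniform element-wise shrinkage."""
    return [[x / (1.0 + t) for x in row] for row in V]

def prox_group(V, t):
    """Group lasso: scale each column by max(0, 1 - t / ||column||_2)."""
    n_rows, n_cols = len(V), len(V[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        norm = math.sqrt(sum(V[i][j] ** 2 for i in range(n_rows)))
        # The scale depends on the column norm: small columns vanish,
        # large columns are shrunk by a column-specific factor.
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        for i in range(n_rows):
            out[i][j] = scale * V[i][j]
    return out

def prox_elastic_net(V, t1, t2):
    """Elastic net: soft-threshold, then shrink by 1 / (1 + t2)."""
    return prox_sq_l2(prox_l1(V, t1), t2)

def prox_sparse_group(V, t1, tg):
    """Sparse group lasso: soft-threshold, then column-wise group shrinkage."""
    return prox_group(prox_l1(V, t1), tg)

if __name__ == "__main__":
    V = [[3.0, 0.3], [4.0, -0.4]]  # column norms: 5.0 and 0.5
    print(prox_l1(V, 0.5))
    print(prox_group(V, 1.0))
    print(prox_sparse_group(V, 0.5, 1.0))
```

In this toy matrix the second column has norm 0.5, so the group operator with threshold 1.0 removes it entirely, while the first column (norm 5) is only scaled by 0.8. This mirrors the abstract's description of group lasso as column-level sparsity plus a column-dependent shrinkage for the surviving columns.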

Similar resources

Speaker adaptation based on sparse and low-rank eigenphone matrix estimation

The eigenphone based speaker adaptation outperforms the conventional MLLR and eigenvoice methods when the adaptation data is sufficient, but it suffers from severe over-fitting when the adaptation data is limited. In this paper, l1 and nuclear norm regularization are applied simultaneously to obtain a more robust eigenphone estimation, resulting in a sparse and low-rank eigenphone matrix. The s...

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

Regularized-MLLR speaker adaptation for computer-assisted language learning system

In this paper, we propose a novel speaker adaptation technique, regularized-MLLR, for Computer Assisted Language Learning (CALL) systems. This method uses a linear combination of a group of teachers’ transformation matrices to represent each target learner’s transformation matrix, thus avoiding the over-adaptation problem that erroneous pronunciations come to be judged as good pronunciations afte...

A Study on Speaker-Adaptive Speech Recognition

A speaker-independent system is desirable in many applications where speaker-specific data do not exist. However, if speaker-dependent data are available, the system can be adapted to the specific speaker so that the error rate is significantly reduced. In this paper, the DARPA Resource Management task is used as the domain to investigate the performance of speaker-adaptive speech recognitio...

Journal title:
  • EURASIP J. Audio, Speech and Music Processing

Volume: 2014

Publication date: 2014